-
Notifications
You must be signed in to change notification settings - Fork 92
Convert common sub-functions as common sub-expressions #2788
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Consider the benchmark using JuMP
f(x, u) = [sin(x[1]) - x[1] * u, cos(x[2]) + x[1] * u]
function RK4(f, X, u)
k1 = f(X , u)
k2 = f(X+k1/2, u)
k3 = f(X+k2/2, u)
k4 = f(X+k3 , u)
X + (k1 + 2k2 + 2k3 + k4) / 6
end
import Ipopt
function bench(n)
model = direct_model(Ipopt.Optimizer())
@variable(model, q[1:2])
@variable(model, u)
x = q
for _ = 1:n
x = RK4(f, x, u)
end
@constraint(model, x .== 0);
@objective(model, Min, u^2)
@time optimize!(model)
end
@time bench(4) When combined with jump-dev/JuMP.jl#4032, I now get:
All the time is actually being spent by
This could be fixed by checking the model at the same time as we do Another caveat is that we need to use MathOptInterface.jl/src/Utilities/copy.jl Line 181 in 1694a00
ruins everything. |
I fixed the performance issue of using JuMP
f(x, u) = [sin(x[1]) - x[1] * u, cos(x[2]) + x[1] * u]
function RK4(f, X, u)
k1 = f(X , u)
k2 = f(X+k1/2, u)
k3 = f(X+k2/2, u)
k4 = f(X+k3 , u)
X + (k1 + 2k2 + 2k3 + k4) / 6
end
import Ipopt
function bench(n)
model = Model(Ipopt.Optimizer)
@variable(model, q[1:2])
@variable(model, u)
x = q
for _ = 1:n
x = RK4(f, x, u)
end
@constraint(model, x .== 0);
@objective(model, Min, u^2)
@time optimize!(model)
end
bench(1)
bench(2)
bench(3)
@time bench(4) I get
|
There's a cost to adding subexpressions. I need to think very carefully if we should do this. It's not an obvious win. I've been procrastinating on this, not because it is technically difficult, but because I'm not sure if it is something that MOI should even do. Users can manually extract their subexpressions if they desire. I think at minimum, we're going to need a much larger set of benchmarks. |
I agree, it's not a clear choice. What convinced me to write this PR is the following. When users share sub-expressions by reference, we either:
The exponential blowup can make problems intractable. We have workarounds but most users won't think about them, they will just have their computer freeze. On the other hand, having more sub-expressions than necessary can at worse be a bit slower. And we're not inventing sub-expressions, the user already had them in their code, either unintentionally or for saving memory. Post-hoc detection (like what you did in https://github.com/lanl-ansi/MathOptSymbolicAD.jl) is complementary as it catches more sub-expressions but it doesn't prevent the blowup during construction. |
parent_node, | ||
) | ||
data.cache[arg] = (expr, length(expr.nodes)) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think we can tidy up all of this, but I'm not opposed to this part necessarily. We could be cleverer in how we do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this version should be working and tested so now it could be refactored :)
src/Utilities/functions.jl
Outdated
else | ||
parent.args[i] = MOI.Utilities.map_indices(index_map, arg) | ||
end | ||
end | ||
if !isnothing(nl_cache) | ||
nl_cache[f] = root | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part sucks. But we could work something out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the fact that we should add a keyword argument to the function ?
@@ -103,6 +103,9 @@ function MOI.get( | |||
) where {F,S} | |||
MOI.throw_if_not_valid(v, ci) | |||
f, _ = v.constraints[ci]::Tuple{F,S} | |||
if f isa MOI.ScalarNonlinearFunction | |||
return f | |||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is an issue that is quite hard to understand the implications of. We haven't exactly codified the contract of where functions get copied throughout the interface, because we assumed that it was cheap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One think we could do is to call MA.copy_if_mutable
. It should work since
julia> MOI.MA.mutability(MOI.ScalarNonlinearFunction)
MutableArithmetics.IsNotMutable()
julia> MOI.MA.mutability(MOI.ScalarAffineFunction{Float64})
MutableArithmetics.IsMutable()
I am happy if this works only in direct-mode. I need to think about this more. If we can build the tape fast, but it's slow to evaluate, then I'd consider writing a new AD backend that is structured as a complete DAG, rather than the current function-based approach. |
Do we have a benchmark showing that the subexpressions are slow ? I think you shared one at some point but I don't remember if it showed that it was slower or just comparable so not worth it. We can definitely split this PR in smaller ones. |
Follow up from jump-dev/JuMP.jl#4032.
At the moment, you can have a small model in terms of memory footprint of the
MOI.ScalarNonlinearFunction
. However, when you add it to the MOI level, the AD tape can be exponentially bigger.With this PR, the AD tape has the same memory footprint as the
MOI.ScalarNonlinearFunction
simply by interpreting the use of the same (in terms of the same pointer in memory, no expensive comparison is done) sub-functions as sub-expressions.What's a little bit tricky to handle is that the first time to see an expression, you're just going to add it to the tape so the second time you see it, you need to replace it in the tape as a subexpression and update the indices of variables after the tape.
Since the sub-expression is contiguous in the tape, it's fortunately not too hard to do.
It also prevents the copy of
MOI.ScalarNonlinearFunction
s, what's the catch there ?Closes jump-dev/JuMP.jl#4024